The Mean Partition Theorem of Consensus Clustering
Abstract
To devise efficient solutions for approximating a mean partition in consensus clustering, Dimitriadou et al. [3] presented a necessary condition of optimality for a consensus function based on least-squares distances. We show that their result is also pivotal for deriving properties of consensus clustering beyond optimization. To this end, we state the necessary condition of optimality in a slightly stronger form as the Mean Partition Theorem and extend it to the Expected Partition Theorem. To underpin its versatility, we give three examples that apply the Mean Partition Theorem: (i) the equivalence of the mean partition and optimal multiple alignment, (ii) the construction of profiles and motifs, and (iii) the relationship between consensus clustering and cluster stability.
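The abstract does not spell out the consensus function, so the following is only a minimal sketch under stated assumptions: partitions are represented as 0/1 membership matrices, the distance between two partitions is the squared Frobenius distance minimized over cluster relabelings, and the consensus function is the sum of these squared distances over the sample. The helper names (`to_matrix`, `sq_dist_aligned`, `consensus_cost`, `aligned_average`) are hypothetical and not taken from the paper. The sketch compares the cost of one sample partition with the cost of the average of the sample after each member has been aligned to it.

```python
from itertools import permutations


def to_matrix(labels, k):
    """Hypothetical helper: k x n 0/1 membership matrix (rows are clusters)."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(k)]


def sq_dist_aligned(X, Y):
    """Assumed distance: squared Frobenius distance between X and Y,
    minimized over all permutations of the rows (cluster labels) of Y.
    Returns (distance, best_permutation). Brute force, small k only."""
    k, n = len(X), len(X[0])
    best = None
    for perm in permutations(range(k)):
        d = sum((X[c][j] - Y[perm[c]][j]) ** 2
                for c in range(k) for j in range(n))
        if best is None or d < best[0]:
            best = (d, perm)
    return best


def consensus_cost(Z, sample):
    """Assumed consensus function: sum of squared aligned distances."""
    return sum(sq_dist_aligned(Z, X)[0] for X in sample)


def aligned_average(Z, sample):
    """Average of the sample matrices after aligning each one to Z."""
    k, n = len(Z), len(Z[0])
    avg = [[0.0] * n for _ in range(k)]
    for X in sample:
        _, perm = sq_dist_aligned(Z, X)
        for c in range(k):
            for j in range(n):
                avg[c][j] += X[perm[c]][j] / len(sample)
    return avg


# Three toy partitions of four items into two clusters (made-up data).
sample = [to_matrix(labels, 2)
          for labels in ([0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 0])]
Z = sample[0]
avg = aligned_average(Z, sample)
cost_Z = consensus_cost(Z, sample)
cost_avg = consensus_cost(avg, sample)
print(cost_Z, cost_avg)
```

On this toy sample the aligned average has a lower cost than the starting partition. That illustrates why averaging after alignment is a natural candidate for a least-squares consensus; it is not a restatement of the theorem itself.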
Similar resources
Condorcet's Jury Theorem for Consensus Clustering and its Implications for Diversity
Condorcet’s Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet’s Jury Theorem to the mean partition approach under the additional assumptions that a unique ground-truth partit...
Asymptotic Behavior of Mean Partitions in Consensus Clustering
Although consistency is a minimum requirement of any estimator, little is known about consistency of the mean partition approach in consensus clustering. This contribution studies the asymptotic behavior of mean partitions. We show that under normal assumptions, the mean partition approach is consistent and asymptotically normal. To derive both results, we represent partitions as points of some geo...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Consensus clustering by graph based approach
In this paper, we propose G-Cons, an extension of a graph minimal coloring paradigm for consensus clustering. Based on the coassociation values between data, our approach is a graph partitioning one which yields a combined partition by maximizing an objective function given by the average mutual information between the consensus partition and all initial combined clusterings. It exhibits more i...
Journal: CoRR
Volume: abs/1604.06626
Publication date: 2016